Zero-width Non-joiner

The zero-width non-joiner (ZWNJ) is a non-printing character used in the computerization of writing systems that make use of ligatures. When it is placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively. A space character has the same effect, but a ZWNJ is used when it is desirable to keep the characters closer together or to connect a word with its morpheme. The ZWNJ is encoded in Unicode as U+200C ZERO WIDTH NON-JOINER.
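As a quick check of the encoding, the following minimal Python sketch inspects U+200C with the standard library's `unicodedata` module. It shows that, although invisible, the ZWNJ is an ordinary code point that changes a string's length and equality. The variable names and the "selfish" sample word are illustrative only.

```python
import unicodedata

ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER

# Properties from Python's bundled Unicode character database.
print(f"U+{ord(ZWNJ):04X}")        # U+200C
print(unicodedata.name(ZWNJ))      # ZERO WIDTH NON-JOINER
print(unicodedata.category(ZWNJ))  # Cf (format character, not a letter or space)
print(ZWNJ.encode("utf-8"))        # b'\xe2\x80\x8c'

# Invisible, but still a real code point in the stored text.
plain = "selfish"
marked = "self" + ZWNJ + "ish"
print(len(plain), len(marked))            # 7 8
print(marked == plain)                    # False
print(marked.replace(ZWNJ, "") == plain)  # True
```

Text that is searched or compared may therefore need the ZWNJ handled explicitly, as in the last line above.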


Use of ZWNJ and unit separator for correct typography

In certain languages, the ZWNJ is necessary for unambiguously specifying the correct typographic form of a character sequence. The ASCII control code unit separator was formerly used for this purpose. The picture shows how the code looks when it is rendered correctly, and in every row the correct and incorrect pictures should be different. On a system that is not configured to display Unicode correctly, the correct and incorrect displays may look the same, or either of them may differ significantly from the corresponding picture.

In this Biblical Hebrew example, the placement of the to the left of the is correct, which has a sign written as two vertical dots to denote a short vowel. If a were placed to the left of , it would be erroneous. In Modern Hebrew, there is no reason to use the for spoken language, so it is rarely used in Modern Hebrew typesetting.

In German typography, ligatures may not cross the constituent boundaries within compounds. Thus, in the first German example, the prefix is separated from the rest of the word to prohibit the ligature fl. Similarly, in English, some argue that ligatures should not cross morpheme boundaries. For example, 'fly' and 'fish' are morphemes in some words but not in others; by this reasoning, words like 'deaf‌ly' and 'self‌ish' (here shown with the non-joiner) should not have ligatures (of fl and fi, respectively), while 'dayfly' and 'catfish' should have them.

Persian uses this character extensively for certain prefixes, suffixes and compound words. It is necessary for distinguishing such compounds, whose parts are joined with a ZWNJ, from sequences of separate words, which are divided by a full space.

In the Jawi script of Malay, ZWNJ is used whenever more than one consonant is written at the end of a phrase (for example, in the Malay word for 'science', sains in Latin script, pronounced /ˈsa.ɪns/). It signals that no vowel (specifically 'a' or 'ə') comes between the two consonant letters; without it, the word would be read as /ˈsa.ɪnas/ or /ˈsa.ɪnəs/. A space would instead split the phrase into separate words, so that it would mean 'to sign the Arabic letter sin'.
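The morpheme-boundary use described above amounts to placing U+200C between the parts of a word instead of a space. The sketch below assumes a hypothetical helper `join_with_zwnj` (not part of any library) and uses the English words from the text plus the Persian plural کتاب‌ها ('books'). It only builds the strings; whether a ligature or cursive join is actually suppressed depends on the font and shaping engine.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def join_with_zwnj(*morphemes: str) -> str:
    """Hypothetical helper: join morphemes with ZWNJ so the boundary is kept
    (no ligature or cursive join across it) without inserting a space."""
    return ZWNJ.join(morphemes)


# English examples from the text: block the fl and fi ligatures at the boundary.
deafly = join_with_zwnj("deaf", "ly")
selfish = join_with_zwnj("self", "ish")

# Persian plural suffix -ha attached to ketab ('book'): one word, no full space.
books = join_with_zwnj("کتاب", "ها")
books_spaced = "کتاب" + " " + "ها"  # a full space makes two separate words

for word in (deafly, selfish, books):
    # Show code points so the invisible U+200C can be seen.
    print(" ".join(f"U+{ord(ch):04X}" for ch in word))

# U+200C is not whitespace, so str.split() keeps the ZWNJ form as one token.
print(len(books.split()), len(books_spaced.split()))  # 1 2
```

Because Python does not classify U+200C as whitespace, the ZWNJ-joined form stays one token while the spaced form splits into two, mirroring the compound versus separate-word distinction described for Persian.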


Use of ZWNJ to display alternative forms

In Indic scripts, inserting a ZWNJ after a consonant with a halant (virama), or before a dependent vowel, prevents the characters from being joined as they otherwise would be. In Devanagari, the characters क् and ष typically combine to form क्ष, but when a ZWNJ is inserted between them, क्‌ष is seen instead. In Kannada, the characters ನ್ and ನ combine to form ನ್ನ, but when a ZWNJ is inserted between them, ನ್‌ನ is displayed. That style is typically used to write non-Kannada words in Kannada script: "Facebook" is written as ಫೇಸ್‌ಬುಕ್, though it can also be written as ಫೇಸ್ಬುಕ್. ರಾಜ್‌ಕುಮಾರ್ and ರಾಮ್‌ಗೊಪಾಲ್ are other examples of proper nouns that need a ZWNJ.

In Bengali, when the letter য occurs at the end of a consonant cluster (that is, য preceded by a ◌্, hôsôntô), it appears in a special shape known as the য-ফলা (ja-phala), as in ক্য (ক ্ য). When the letter র occurs at the beginning of a consonant cluster (that is, র followed by a hôsôntô), it appears in a special shape known as the রেফ (reph). Thus, the sequence র ্ য is rendered by default as র্য. When the য-ফলা shape needs to be retained instead of the রেফ shape, a ZWNJ is inserted right after র, i.e. র‌্য. This form is commonly used for loanwords from English such as র‍্যান্ডম (random). Words like উদ্‌ঘাটন, where the hôsôntô needs to be displayed explicitly, also require a ZWNJ inserted after the hôsôntô.
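The Indic sequences above can be written out as explicit code points. This minimal Python sketch builds them from their Unicode character names with `unicodedata.lookup`; `seq` and `with_zwnj` are hypothetical helper names. It shows only where the ZWNJ goes in the stored text. The visible result (conjunct, explicit halant, reph or ya-phala) is decided by the font and shaping engine, so it may differ on a given system.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def seq(*names: str) -> str:
    """Hypothetical helper: build a string from Unicode character names."""
    return "".join(unicodedata.lookup(name) for name in names)


def with_zwnj(text: str, index: int) -> str:
    """Hypothetical helper: insert a ZWNJ before position `index`."""
    return text[:index] + ZWNJ + text[index:]


# Devanagari KA + VIRAMA + SSA: normally shaped as the conjunct क्ष.
ksha = seq("DEVANAGARI LETTER KA", "DEVANAGARI SIGN VIRAMA", "DEVANAGARI LETTER SSA")
ksha_split = with_zwnj(ksha, 2)  # ZWNJ after the virama suppresses the conjunct

# Kannada NA + VIRAMA + NA: normally ನ್ನ; ZWNJ after the virama splits it.
nna = seq("KANNADA LETTER NA", "KANNADA SIGN VIRAMA", "KANNADA LETTER NA")
nna_split = with_zwnj(nna, 2)

# Bengali RA + VIRAMA + YA: reph form by default; ZWNJ right after RA keeps ya-phala.
rya = seq("BENGALI LETTER RA", "BENGALI SIGN VIRAMA", "BENGALI LETTER YA")
rya_phala = with_zwnj(rya, 1)

for label, text in [
    ("Devanagari default", ksha),
    ("Devanagari + ZWNJ", ksha_split),
    ("Kannada default", nna),
    ("Kannada + ZWNJ", nna_split),
    ("Bengali default", rya),
    ("Bengali + ZWNJ", rya_phala),
]:
    # Print the code points so the invisible ZWNJ (U+200C) is visible.
    codes = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{label:20} {codes}")
```

Note that the insertion point differs by script in these examples: after the virama for Devanagari and Kannada, but directly after র (before the hôsôntô) for the Bengali ya-phala case.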


Symbol

The symbol to be used on keyboards that enable direct input of the ZWNJ is standardized in Amendment 1 (2012) of ISO/IEC 9995-7:2009 "Information technology – Keyboard layouts for text and office systems – Symbols used to represent functions" as symbol number 81, and in IEC 60417 "Graphical Symbols for use on Equipment" as symbol no. IEC 60417-6177-2.


See also

* Zero-width joiner
* Zero-width space
* Word divider




External links


Using the ZWNJ in Persian

